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Abstract. The ensemble Kalman filter (EnKF) and ensemble square root filter (ESRF) 
are data assimilation methods used to combine high dimensional, nonlinear dynamical 
models with observed data. Despite their widespread usage in climate science and oil 
reservoir simulation, very little is known about the long-time behavior of these methods 
and why they are effective when applied with modest ensemble sizes in large dimensional 
turbulent dynamical systems. By following the basic principles of energy dissipation and 
controllability of filters, this paper establishes a simple, systematic and rigorous framework 
for the nonlinear analysis of EnKF and ESRF with arbitrary ensemble size, focusing on 
the dynamical properties of boundedness and geometric ergodicity. The time uniform 
boundedness guarantees that the filter estimate will not diverge to machine infinity in 
finite time, which is a potential threat for EnKF and ESQF known as the catastrophic 
filter divergence. Geometric ergodicity ensures in addition that the filter has a unique 
invariant measure and that initialization errors will dissipate exponentially in time. We 
establish these results by introducing a natural notion of observable energy dissipation. The 
time uniform bound is achieved through a simple Lyapunov function argument, this result 
applies to systems with complete observations and strong kinetic energy dissipation, but also 
to concrete examples with incomplete observations. With the Lyapunov function argument 
established, the geometric ergodicity is obtained by verifying the controllability of the filter 
processes; in particular, such analysis for ESQF relies on a careful multivariate perturbation 
analysis of the covariance eigen-structure. 


1. Introduction 

An important problem in scientific computing is the effective assimilation of observational 
data with high dimensional nonlinear forecast models. The classical filtering tools, such as 
the Kalman filter and extended Kalman filter, are poorly suited to these problems, due both 
to the nonlinearity of the models and the cost of computing covariance matrices for high 
dimensional state vectors. The ensemble Kalman filter (EnKF) and ensemble square root 
filter (ESRF) were designed to overcome these difficulties [1, 2, 3, 4], The basic idea of these 
methods is to propagate an ensemble { Vn ' } ,..., Vn K> } to describe the forecast distribution 
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of the underlying system U n , and then assimilating the new observation via a Kalman-type 
update using the ensemble mean and covariance. The state estimate remains useful even 
when the ensemble size is several orders of magnitude smaller than the state dimension, 
leading to a considerable benefit in computational cost. Due to their efficiency, EnKF 
and ESRF are broadly used, notably in ocean-atmosphere science [5, 4] and oil reservoir 
simulations [6]. 

Despite the ubiquitous application of EnKF and ESRF, little is known of their dynamical 
behavior beyond that provided by numerical experiments. Existing theoretical studies of 
EnKF and ESRF focus mainly on either error estimation of one single assimilation step 
[7, 8], or in the case of linear model dynamics, convergence to the classical Kalman filter 
as the ensemble size tends to infinity [9, 10]. The aim of this article is to address a more 
practical scenario, namely by looking at the long-time behavior of the ensemble when the 
ensemble size is fixed and where the underlying model is nonlinear. To be specific, we seek 
to address the following questions: 

(i) Under what model conditions does the ensemble remain bounded on an infinite time 
horizon? 

(ii) Are the filter processes ergodic and how quickly do they lose memory of initial 
conditions? 

These two questions are of great practical importance. Boundedness of the ensemble 
prohibits the state estimate from diverging to infinity, thereby precluding the disastrous 
phenomena of catastrophic filter divergence [11, 12, 13, 4], Ergodicity of the ensemble 
ensures that errors in the initialization of the filter will not affect the performance of the 
filter in the long run [14, 15, 16], geometric ergodicity further insures that the error will 
dissipate exponentially fast in time. To the best of our knowledge, the only article in a 
similar setting is [17], where the authors show wcll-posedness of EnKF with bounds which 
can grow exponentially in time and accuracy under variance inflation. 

In the study of dynamical systems, boundedness can be demonstrated through the 
construction of a Lyapunov function £, which is a positive function statisfying the dissipation 
criterion 

E n _i 8(U n ) < (1 - P)£{U n -i) + K . (1.1) 

Here E ra _! denotes the conditional expectation with respect to the information at time n — 1, 
and 0 < ft < 1 is a constant. Using the discrete Gronwall inequality, we immediately find 
that 

E£(£4) < (1 - /?) n E£(Uo) + Kfi~\ 

which shows that, under expectation of the Lyapunov function £, the state U n is bounded 
uniformly in n. In geophysically relevant models, such as the Lorenz equations and Navier- 
Stokes equations, the corresponding S can be chosen as the kinetic energy £(■) = \ • j 2 . In 
this scenario, the relation (1.1) is known as an energy principle. 
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A natural strategy for proving boundedness of EnKF and ESRF is to check whether the 
energy principles of the nonlinear system are inherited by the ensemble. In other words, if 
£ is a Lyapunov function of the original system U n , can we use £ to construct a Lyapunov 
function for the ensemble processes {E,P' l }f =1 . 

In Theorems 3.2 and 3.3 we will show that this construction is quite straight-forward, 
provided that the underlying model satisfies a so-called observable energy criterion. To 
be specific, suppose the model is observed linearly via Z n = HU n + ( n , then the observable 
energy criterion states that the underlying model satisfies (1.1) with the choice £{u ) = \Hu\ 2 , 
that is 

^n-i\HU n \ 2 < (1 - (3)\HU n _i\ 2 + K . (1.2) 

Linder this assumption, it is shown in Theorems 3.2, 3.3 that the ensemble satisfies 

a related energy principle. Hence, if the model satisfies (1.2) with full rank H, then the 
ensemble {V,^} must remain bounded on an infinite time horizon. When H is not of full 
rank, one still obtains an energy principle from (1.2), but can only conclude boundedness of 
the observable ensemble {HV^}fi =1 . 

With a Lyapunov function established, the EnKF and ESRF are shown to be 
geometrically ergodic by Theorems 5.5, 5.6 and 5.8, assuming the nonlinear system is 
propagated with non-degenerate noise. The proofs are conceptually simple, as it suffices 
to check to the controllability of the Liters, thanks to the classical work of [18, 19, 20]. 
The only technical challenge lies in the eigenvalue decomposition of matrices required in 
the assimilation step of ESRF. This can be resolved by a careful multivariate perturbation 
analysis of the underlying matrices. 

With a short discussion in Section 4, we will demonstrate a few sufficient conditions that 
imply the observable energy criterion (1.2). When H is of full rank, the observable energy 
criterion holds provided that the dynamics have an energy principle with strong contraction 
parameter, depending on the condition number of H . When H is not of full rank, the 
observable energy criterion does not hold in general, but is verifiable in several concrete 
examples through direct calculation. This dichotomy of observational rank agrees with 
known numerical evidence, where the ensemble behaves stably when full rank observations 
are available, but can experience Liter divergence when the observations are sparse [12, 4], 
or even reach machine inhnity in Lnite time, which is known as catastrophic Liter divergence 
[11, 4, 13]. Using the same philosophy in this paper, the authors have found a concrete 
dynamical system that satisLes the kinetic energy criterion but not the observable energy 
criterion, and whose EnKF ensemble (provably) experiences catastrophic Liter divergence 
with large probability. The authors have also found a general adaptive covariance inLation 
scheme, which always insures that the ensemble remains bounded on an inhnite time horizon, 
without hurting the accuracy of original Liters. These results will be reported in two separate 
papers [21, 22], 

The article is structured as follows. In Section 2.1 we formulate the EnKF and ESRF 
methods and also introduce the notion of Lyapunov functions and energy principles. In 
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Section 3 we establish a simple framework to verify energy principles for EnKF and ESRF 
using the observable energy. Section 4 discusses the applicability of this framework by 
studying a few sufficient conditions that guarantee the observable energy condition. In 
Section 5 we prove the geometric ergodicity of the filter processes assuming the stability 
results in Section 3 hold. In Section 6 we conclude this paper and discuss possible extensions. 

2. Models setup and fundamental concepts 

2.1. Model setup 

In this paper, we assume the signal sequence U n G is generated through a nonlinear 
mapping plus a mean zero noise ( n , and the observation is a linear one plus some mean 
zero noise in M 9 : 

U n = ^ h (U n -i) + f n , Z n = HU n + f n . (2-1) 

Here {£ n } is an i.i.d. noise sequence, and ( n is independent of <^ 1 ,...^ n _ 1 conditioned on 
the realization of U n -\. In many cases, the model may be generated by the solution of a 
stochastic differential equation (SDE) 

du t = ip(u t )dt + E dW t (2.2) 

by taking U n = u n h for some fixed h > 0. A short discussion of this discrete time formulation 
is attached in Appendix B. 

At this stage, we impose no restrictions on ( n except that it is mean zero with a 
conditional covariance depends on U n - p 

E(C n |£4_i) = 0, E(£ n ® ( n \U n -i) = Rh(U n - 1). (2.3) 

As for the observation part, we will assume in this paper that 

rank(R) = q < d, E(£ n |£/ n _i) = 0, E(f n (8) f n \U n -i) = I q - 

The seemingly restrictive choice of observational noise covariance can be made without loss 
of generality. Indeed, any observational covariance can be reduced to the identity via a 
simple coordinate change on the filtering problem. To apply the results of this article 
to a filtering problem with non-trivial observational covariance, one must first apply the 
coordindate change and then check the assumptions in the new system of coordinates. Details 
are contained in the remark below. 

Remark 2.1. Suppose that f n has a nonsingular covariance matrix T and T~ l ^ 2 H has an 
SVD decomposition T~ l / 2 H = <f>AT T , then we change the coordinate system and consider 

u n = ^ T u n , e~n = $ T r- 1/2 Cn, z n = $ T r~ 1/2 z„ = \u n + £ n . (2.4) 

Hence this change of coordinates also reduces the observation matrix to a diagonal matrix. 
If the observation dimension q is larger than the model dimension d, the last d — q diagonal 



Nonlinear stability and ergodicity of ensemble based Kalman filters 


5 


entries of A are zero, so the last d — q rows of Z n are independent of the signal and useless 
for filtering purpose, which we can ignore and set d = q. Moreover f n will have covariance 
matrix I q . Since all the transformations above are linear and bijective, filtering U n with Z n 
is equivalent to filtering U n with Z n . On the other hand, if the covariance T is singular, then 
certain linear subspace can be observed exactly, and may cause the filtering operation to be 
singular. We do not consider such pathological cases in this paper. 

2.2. Ensemble Kalman filter 

In the standard Kalman filtering theory, the conditional distribution of the signal process U n 
given the observation sequence Z\,, Z n is given by a Gaussian distribution. EnKF inherits 
this idea by using a group of ensembles {V^}^ to represent this Gaussian distribution, 
as the mean and covariance can be taken as the ensemble mean and covariance. The EnKF 
operates very much like a Kalman filter, except its forecast step requires a Monte Carlo 
simulation due to the nonlinearity of the system. In detail, the EnKF is an iteration of 
following two steps, with (for instance) being sampled from the equilibrium measure of 
U n . 

• Forecast step: from the posterior ensemble at time n — 1, {Vn-i}k=n a forecast ensemble 
for time n is generated by 

vp = *,.(rS) + Cf, 

(k) 

where Q. ; are independent samples drawn from the same distribution as Q n . Then the 
prior distribution for time n is described by the ensemble mean and covariance: 

E:=-^Y J (0 ) -Vn)®(Vi k) -V n ). (2,5) 

k= 1 k= 1 

• Analysis step: upon receiving the new observation Z n , random perturbations of it are 
generated by adding fn' 1 : 

z( fe ) — z + £ (fc) 

i S n ? 

where are independent samples drawn from the same distribution as f n . Each 
ensemble member is then updated to 

V W = yW _ C n H T (I + HC n H T )-\HV^ - z™) . 

In summary, the EnKF is generated by the following dynamics 

K n ( fc) = yW - C n H T (I + HC n H T )-\HV^ _ Z W), 

W = + (P, V n :=fj2 W. Z { nl 1 = Zn + 

v k= 1 

Cn ■■= E(k w - E) ® - E). 


( 2 , 6 ) 
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From this formulation, it is clear that the augmented process {U n , V^\ ..., Vn <} } is a Markov 
chain. In the following discussion, we will denote the natural filtration up to time n as 
T n = cr{U m , Vm \ ..., Vm K \ m < n}, and denote the conditional expectation with respect to 
J~ n as E n . 

2.3. Ensemble square root filters 

(k) 

One drawback of EnKF comes from its usage of artificial noise fn , as this introduces 
unnecessary sampling errors, particularly when the ensemble size is small [23]. The 
motivation behind the artificial noise is to make the posterior ensemble covariance 

c„ := - F ") ® N3 - W)t v„ := i V™, 

k =1 k =1 

satisfy the covariance update of the standard Kalman filter 

C n = C n - C n H T {H T C n H + iy l HC n , (2.7) 

when the left hand is averaged over [24, 25, 7] . ESRFs, including the ensemble transform 
Kalman filter (ETKF) and the ensemble adjustment Kalman hlter (EAKF), aim to resolve 
this issue by manipulating the posterior spreads to ensure that (2.7) holds. Both ETKF and 
EAKF algorithms are described by the following update steps, with the only difference 
occurring in the assimilation step for the spread. As with EnKF, the initial ensemble 
{} k = i is (for instance) sampled from the equilibrium distribution of U n . 

• Forecast step: identical to EnKF, the forecast ensembles at time n is generated from 
posterior ensembles at time n — 1: 

= 'U(kS)+ c?>. 

The forecast ensemble covariance C n is then computed using (2.5). 

• Assimilation step for the mean: upon receiving the new observation Z n , the posterior 
ensemble mean is updated through 

Vn = v n - C n H T (I + HC n H T )-\HV n - Z n ), V n = -J2 (2-8) 

Y k =1 

• Assimilation step for the spread: The forecast ensemble spread is given by the d x K 

matrix _ _ 

Sn = [C„ (1) ~ C„,. . ., l/W _ V n ] . 

To update the spread, first hnd a matrix T n e W dxd (for ETKF) or A n e W RxK (for 
EAKF) such that 

YZrJnSn ® TS n = j^jS n A n ® S n A n = d n - d n H T (H T d n H + I)~ l HC n . (2.9) 
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The posterior spread is updated to S n = T n S n (for ETKF) or S n = S n A n (EAKF), and 
the ensemble members are updated to 

= v n + s<‘>, 

where denotes the fc-th column of the updated spread matrix S n . By construction, 
the posterior covariance C n = (K — 1 )~ 1 SfiS n satisfies (2.7). 

At this stage it suffices to know that such A n and T n exist, their finer properties play 
no role in the discussion concerning stability. Their formulation will become important 
when we want to study ergodicity in Section 5, and a detailed formulation will be given 
there. Based on our description above, the augmented process {U n , Vn l \ ■ ■ ., Vn K ^} is 
again a Markov chain. As above, we will denote the natural filtration up to time n as 
T n = cr{U m , Vm \ ..., Vm X \m < n}, and denote the conditional expectation with respect to 
J~ n as E n . 

2-4- Covariance inflation 

When applying EnKF and ESRF, the forecast ensemble covariance C n often underestimates 
the uncertainty in the forecast model. An ad hoc solution is to use inflated or modified 
forecast ensemble covariance in the assimilation step. We will discuss three types of such 
methods in this paper: 

• Additive inflation: replace C n with C n + XI for a proper A > 0; 

• Uniform inflation: replace C n with (1 + A )C n for a proper A > 0. 

It should be noted that additive inflation is only used in EnKF and not in the square root 
filters since it is not clear how an additive inflation should be applied at the level of the 
matrix square root. There are other ad hoc ways of modifying ESRF methods with additive 
inflation, see page 147 of [4] for more details. 

2.5. Energy principles and Lyapunov functions 

Stability for nonlinear systems can be studied through energy principles. That is, certain 
types of energy are preserved or dissipated by the dynamics. In a stochastic setting, this 
idea is formalized using Lyapunov functions. In this paper, we say that E is a Lyapunov 
function for a Markov chain X n if there exists positive constants 0 < fl < 1 and K such that 

E(£(AA)|X n _i) < (1 - fl)E(X n _,) + K (2.10) 

for all n G Z + . For a continuous time Markov process X t with generator £, the previous 
relation is replaced by 


CE{x) < —fl£(x) + K , 


(2.11) 
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where we only require fi, K >0. As a simple consequence of Gronwall’s inequality, the 
existence of a Lyapunov function implies (respectively) that 

E £(X n ) < (1 - f3) n ESX 0 + K//3, or E £{X t ) < e~^E£X 0 + K//3. (2.12) 

In other words, E £(X n ) (or respectively E S(X t )) can be bounded uniformly in time. In this 
case, we say that X n (or X t ) is £-bounded. 

When the sub-level sets of £ are compact, we will call £ a strong Lyapunov function. 
This additional requirement implies that an ^-bounded Markov chain revisits a large enough 
compact set arbitrarily many times and that the distribution X n forms a tight sequence. 
Existence of strong Lyapunov functions will be crucial when proving geometric ergodicity. 

In this paper, we will assume the kinetic energy of the process, £(■) = | • | 2 , is a strong 
Lyapunov function. Based on our formulation of the random sequence U n , this is equivalent 
to the following. 

Assumption 2.2 (Kinetic energy principle). There exist constants 0 < fih < 1, iW > 0, 
such that 

|T ft (u)| 2 + tr{R h {u)) < (1 - fi h )\u\ 2 + K h , 

for all where Rh is the conditional covariance of the system noise defined in (2.3). 

When the random sequence U n is generated from discrete time solutions of an SDE u t , 
this kinetic energy principle can be verihed directly by computing the generator (2.11) or 
simply checking the drift of the SDE, see Appendix B for more details. LTsing this convenient 
argument, we can easily verify that the following examples all satisfy Assumption 2.2. 

Example 2.3 (Stochastic turbulence models). When U n is a time discretization of the SDE 
(2.2), it suffices to require that for certain (3, K > 0 

C\u\ 2 = ■ u + Ar(EE t ) < -/ 3\u \ 2 + K, (2.13) 

since then Assumption 2.2 would hold with — l — e~^ h ,Kh = Kh. Relation (2.13) holds 
for many stochastic turbulence models, which generally take the form 

d,U t = -DU t dt + B(U t )dt + / + EdW t . 

The linear operator D represents damping, so its symmetric part \{D T + D) is positive 
semidefinite. The nonlinear interaction term B is energy preserving, with ([U t ,B(U t )) = 0. 
Then it is easy to verify that (2.13) holds for /3 = |A mm (D T + D ) and K = \f\ 2 //3. For 
more information on these models and their application to turbulence, see [26, 4, 27], 

In the following, we present two well known turbulent systems that all satisfy relation 
(2.13). Hence, Assumption 2.2 holds for their discrete time formulation. 
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Example 2.4 (Lorenz 96). Let U t = ..., UN,t) be an N > 4 (usually N = 40) 

dimensional system, with its dynamics given by 

h J i,t ^i,t 4" F , 

with the periodic boundary condition u t}k = Ut.,k-N = Ut,k+N for all k and where F is a 
constant forcing. One can easily show that 

N 

|C/*| 2 = -2\U t \ 2 + 2Fj2^i,t < -\U t \ 2 + NF 2 . 

i —1 

Example 2.5 (Truncated stochastic Navier-Stokes system). The incompressible stochastic 
Navier-Stokes equation on a two dimensional torus can be described through the vorticity 
field 

dv t = uA v t dt — B(JCv t , v t )dt + E <J k e k dW ktt . 

k&I? 

Here e k is the Fourier basis for square integrable functions on the torus, so that e k (x ) = e lk ' x , 
/C is the linear Biot-Savart integral operator, mapping e k to e k ik L fikSf, B is the advection 
effect B(u,v ) := (u ■ V)v and (W k ,t)kez 2 is a sequence of independent Wiener processes. It 
is well known that for this process the L 2 -norm is dissipative in time. 

For practical numerical implementation, one needs to truncate the infinite dimensional 
object v t . For example, one can ignore the high Fourier modes and assume the truncated v t 
has the following Fourier decomposition: 

rt ^ ^ r k ^e k T v k ^C— k . 

kei 

Here * denote complex conjugacy, and 

I = {k = (k\, k 2 ) € Z 2 : \k\ < N, k 0, arg(k\ + k 2 i) G [0,7r)}. 

Note by formulation, v t is real valued, and the dynamics of v kjt can be specified by the 
dynamics of v k ,t as follow: 

dv k ,t = -v\k\ 2 v kjt dt - P k B(JCv t ,v t )dt + a k dW ktt . 

where P^ : v i-> (v, e k ) evaluates the k-th Fourier coefficient. Then the full dynamics 
U t = {v k ,t)kei follows a energy principle: 

£\ u t\ 2 = + - ~ 2v \ u ^+J2 a l 

k£l k£l k£l k€l 

Here we used the identity (v, B(ICv, v)) = 0 7 so 

^u fcjf P kB(JCv t ,v t ) = (^Vkjek, B(JCv t ,v t )) = B(JCv t , v t )) = 0. 
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In some other nonlinear models, the stability can be demonstrated only after first 
applying a linear coordinate change. 

Example 2.6 (Lorenz 63). Let U t = ( x t ,yt,z t ) be a three dimensional system following an 
ordinary differential equation (ODE): 


= a(y t - x t ), j^yt = x t (r - z t ) 


d / 
y ti = DUt - bz t . 


Then we can define 

m) 

so using Young’s inequality and (3 : 


rx 2 t + ay 2 t + a(z t - 2r) 2 , 
min{2, 2a, b}, 


^£(U t ) = -2a(nr 2 + y 2 + b(z t - r) 2 ) + 2 bar 2 

< —2 cr(rx 2 + y 2 + \b(z t — 2 r) 2 ) + 4krr 2 

< -fiEfUt) + Abar 2 . 


One should note here that \U t \' 2 does not satisfy relation (2.13) for all choices of parameters. 

Section 4 will have a detailed discussion of this type of Lyapunov function, where it will 
be shown that the Lyapunov dissipation relation (2.10) can be preserved through constant 
shifts for quadratic functions, but not through linear transformations in general. 


3. Bounding the observable energy 

In the analysis of ensemble Kalman Liters, one natural strategy is obtaining a control over 
the configuration of the ensemble members. However, such control is very difficult in general, 
as nonlinear dynamics are known be chaotic and turbulent. In this section, we will discuss 
one type of condition which circumvents this problem. Generally speaking, this condition 
requires that the energy of the observable part, £(■) = \H ■ | 2 , be a Lyapunov function for 
the model U n . In another words, we assume there is a fih £ (0,1) and Kf } > 0 such that 

^ n -i\HU n \ 2 < (1 — f3 h )\HU n _i\ 2 + K h , a.s. 

This condition can be formulated in terms of the propagation equation (2.1). 

Assumption 3.1 (Observable energy criterion). There exists a fih £ (0,1) and a Kh, such 
that 

\HA> h (u)\ 2 + tr{HR h (u)H T ) < (1 - fi h )\Hu\ 2 + K h , 

for all u G M d . Here Rh again is the conditional covariance of the system noise in (2.3). 

The objective of the current section is to show that this property is inherited by 
the ensemble and square root Kalman Liters, so the observable energy of the ensembles, 
Y)k \HVn k, \ 2 has uniformly bounded expectation in time. 
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Under the assumption of full observations H = Li, as made in [7, 17], it is clear that 
Assumption 3.1 is equivalent to Assumption 2.2, hence the observable energy assumption is 
quite natural. When rank(/7) = q = d, the observable energy is actually equivalent to the 
standard kinetic energy, as 

\v\ = ■ Hv | < \(H T H)~ l H T \\Hv\, \Hv\ < \H\\v\. 

However, Assumption 2.2 does not in general imply Assumption 3.1. To achieve this 
implication, one requires that the dissipation in Assumption 2.2 be ‘strong enough’. Section 
4 will provide a detailed discussion of when and how Assumption 3.1 can be verified. 

3.1. Boundedness of the observable energy for EnKF 

The advantage of the observable energy \HU n \ 2 over the kinetic energy \U n \ 2 is that it is 
preserved in the assimilation step (2.6). To see this, left multiply the assimilation equation 
by H and rearrange to obtain 

HV^ k) = (/ + Hd n H T )- ] HV^ + HC n H T {I + HC n H T )~ l Z^. 

The first term on the right can be bounded in terms of HV^\ using Assumption 3.1 and the 
second term can similarly be bounded in terms of \HU n \ 2 and an additive constant. This 
simple observation is the crux of the following result. 

Theorem 3.2. Assume that the signal process U n satisfies the observable energy criterion. 
Assumption 3.1, and let be the EnKF ensemble process. Then 

(i) There exist constants D,M > 0 such that 

E n -i(\HVW\ 2 + M\HU n \ 2 ) < (1 - + M\HU n ^\ 2 ) + D (3.1) 

for each k — 1... K and uniformly in n > 1. In particular, the function 

K 

S(U, V a \ ..., V {K) ) = J2\ HV n ] \ 2 + KM\HU n \ 2 

k= 1 

is a Lyapunov function for the Markov chain (U n , Vn \ ..., Vn K> ) and hence the signal- 
ensemble process is £-bounded. 

(ii) When rank(/7) = q = d, £ is a strong Lyapunov function and 

K 

E|U n | 2 + ^E|U n (fc) | 2 

fc=i 

is bounded uniformly in n > 0. The precise upper bound can be read off directly from 
(2.12) 
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(in) Finally, all the claims above hold for any positive semi-definite choice ofC n , in particular 
any covariance inflation scheme from Section 2.4 satisfies the same relation. 

Proof. Left multiply the first equation of (2.6) with H, 

HV^ ] = HV r (k) - HC n H T (I + Hd n H T )~ 1 (HVi k) - Z^) 

= (/ + Hd n H T )-'HV r [ k) + HC n H T (I + HC n H T )- x Z^. (3.2) 

Then based on elementary Lemma Appendix A.l and Young’s inequality, Lemma Appendix 
A.2 

I HV,Pf 2 < (1 + \P h )\HV^\ 2 + (1 + 2Pp)\ZP\\ 

Using Assumption 3.1 and the conditional independence of Q, ' 

E„-i(|ffU w | 2 ) = + H C«i| 2 ) < (1 - + K h . 

Furthermore, by Young’s inequality 

E^dZ^I 2 ) = E n _fl\HU n + + efl 2 ) < 2E n .fl\HU n \ 2 ) +4 q< 2|i/f/ n _ 1 | 2 + 2 (K h + 2 q). 

Combining these inequalities and using (1 — flhfifi + \flh) < (1 — \flh) we have 
En.fiHV^l 2 < (l~lfl h )\HV^ 1 \ 2 + (2+4l3- 1 )\HU n ^\ 2 +(l + lfl h )K h +2(l+2fl~ 1 )(K h + 2q). 
On the other hand, by Assumption 3.1, for any M > 0 we have 

ME n _i\HU n \ 2 < M(1 - fl h )\HU n ^\ 2 + MK h . 

ffence, by fixing M such that \flhM > (2 A-Afl^ 1 ) and adding the previous two inequalities, 
we see that we can always find a constant D such that 

E n -fl\HV^\ 2 + M\HU n \ 2 ) < (1 - \fl h ){\HV^ k \\ 2 + M\HU n ^\ 2 ) + D. 

This completes the proof of the first claim. The second claim is simply summing the result of 
the first claim over all k. And when H is of rank d, the observable energy \Hv\ 2 is equivalent 
to the square energy |u| 2 

M = \(H t H)~ 1 H t ■ Hv | < \{H T H)~ l H T \\Hv\. 


Finally, notice that we have not used any properties of C n other than it is positive semi- 
definite. □ 
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The boundedness of ESRF ensembles is not too different from EnKF, since the assimilation 
step for mean (2.8) is similar to the assimilation step of EnKF, while the posterior ensemble 
spread in the observable space can be bounded a.s. 

Theorem 3.3. Assume that the signal process U n satisfies the observable energy criterion, 
Assumption 3.1, and let {Vn k ^}(f =1 denote either the EAKF or ETKF ensemble. Then 

(i) The observable posterior covariance HC n H T I d a.s., where 

Cn : = f - F ") ® WT - F ”). 

k= 1 

(ii) There exist constants D, M > 0 such that the ensemble mean V n = Ylk =i satisfies 

En-id HV n \ 2 + M\U n \ 2 ) < (1 - \fl h ){\HV n _ x \ 2 + M\HU n -i\ 2 ) + D , (3.3) 

for all integers n > 2. In particular, the function 

K 

£{U, K (1) ,..., V {K) ) = \ HV n ] \ 2 + K M\HU n \ 2 . 

k= 1 

is a Lyapunov function for the signal-ensemble process (U n , Vh l \ ..., Vn <y ) and hence 
the process is £-bounded. 

(Hi) When rank (H) = d, £ is a strong Lyapunov function and 

K 

E|t/n| 2 + E E K (fe) | 2 

k= 1 

is bounded uniformly in n > 0. The precise bound can be read off directly from (2.12). 

(iv) Again the claims above hold for any choice of positive semi-definite covariance matrix 
C n , in particular the uniform covariance inflation scheme in Section 2.f. 

Proof. From the definition of C n in both ESRF methods, we have that 

HC n H T = HC n H T - Hd n H T (H T C n H + I)- 1 HC n H T 

= HC n H T (H T C n H + I)- 1 (I + HC n H) - HC n H T {H T C n H + I)~ l HC n H T 
= HC n H T {HC n H T + I)~ 1 . 

By Lemma Appendix A.l, we clearly have 0 HC n H T I d . As a consequence, 

K 

E \ HV n k) \ 2 = tr {HCnH 7 ') + K\HV n \ 2 < K\HV n \ 2 + d, 

k= 1 


(3.4) 
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The inequality (3.3) follows almost identically to the proof of (3.1). Indeed, the ensemble 
mean assimilation step (2.8) implies that 

HV n = HV n - HC n H T (I + HC n H T )~\HV n - Z n ) 

= (/ + HC n H T )~ l HV n + HC n H T {l + HC n H T Y x Z n . 

The only difference between this and the proof of (3.1) is that we need to bound E, n _ 1 1 HV n \ 2 , 
but by Jensen’s inequality and (3.4) we have 

— i K 1 _ o K 

Ki-i\HV n \ 2 < -J2 E n-i\HVW\ 2 <^f±J2\ HV n-i\ 2+K h < (l-Ph)\HV n ^\ 2 +d+K h . 

k —1 k =1 

Therefore the argument after (3.2) applies to the process V n verbatim. 

To show that S is a Lyapunov function, it suffices to apply Young’s inequality 

I HV^\ 2 < (1 + \p h )\HV n \ 2 + (1 + 4/Jp)|tfvf’ - HV n | 2 , 


and also see that 

K 

Y, I HVW - HVn | 2 = tr {SlH T HS n ) = tr {HS n S^H T ) = tr(HC n H T ) < tr(/,) < q, 

k= 1 

where S n is the posterior spread matrix, which is given by T n S n or S n A n as in (2.9). Therefore 

E n-iS(U n , VY\ • • •, v^) < (1 + \fi h )K^n-i{\HV n \ 2 + M\HU n \ 2 ) + (1 + 

< (1 - \fi h )K(\HVn-fi 2 + M\HUn.fi 2 ) + KD + (1 + AfiYM 

K 

< (1 - lPh)(Y \ HV n-i? + KAfiHUn-fi 2 ) + KD + (1 + Afi~ l )q 

k= 1 

= (1 - \fSu)W„- 1. dti,.... qb) + KD + (1 + 4 K')q. 

Here we applied Jensen’s inequality inequality in the penultimate step. The proofs for the 
second two claims are identical to Theorem 3.2. □ 

4. Validity of the observable energy criterion 

In Section 3, we have established a series of uniform boundeness results based on the 
observable energy criterion, Assumption 3.1. This criterion is different from the usual energy 
principle for dynamical systems, Assumption 2.2, although they share similar formulations. 
In this section, we will demonstrate a few sufficient conditions that lead to Assumption 
2.2 when the observation is of full rank, and discuss a few concrete examples where the 
observation is rank deficient and Assumption 2.2 still holds. 
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As stated earlier, even when H is of full rank, Assumption 2.2 does not necessarily imply 
Assumption 3.1. Indeed, even though the norms |-| 2 and \H■ | 2 are equivalent, the constants of 
proportionality may preclude the dissipation relation of Assumption 3.1. In a separate work 
[21], the authors have constructed a concrete nonlinear system which satisfies the kinetic 
energy principle, Assumption 2.2, but not the observable energy principle, Assumption 

3.1, and which exhibits catastrophic filter divergence [11, 13] with large probability. This 
indicates the importance of verifying the observable energy criterion over the typical energy 
criterion. 

If the condition number of the matrix H is small enough, a kinetic energy principle 
implies an observable energy principle. To be specific, define the condition number 

Ch '■= max{|iTu||u| : |u| = 1, \Hv\ = 1}, 
which must be finite since H is of full rank. Then we have the following. 

Theorem 4 . 1 . If the kinetic energy principle, Assumption 2.2, holds, then 
\H^ h (u)\ 2 + tr{HR h {u)H T ) < (1 - /3 h )C 2 H \Hu\ 2 + \H\ 2 K h 

where Ch is the condition number of matrix H. In particular, if (1 — fihfCfi < 1, then 
the observable energy criterion, Assumption 3.1, also holds; therefore the average square 
norms of the ensemble members for EnKF and ESQF are bounded uniformly in time as a 
consequence of Theorems 3.2 and 3.3. 

Proof. By Assumption 2.2 

\H^h(u)\ 2 + tr(HR h (u)H T ) < \H\ 2 [\V h (u)\ 2 + ti{R h {u))] 

<(l-/3 h )\H\ 2 \u\ 2 +\H\ 2 K h , 

< (1 — fi h )C 2 H \Hu\ 2 + \H\ 2 K h . 

□ 

Another situation in which a dissipation criterion implies Assumption 3.1 is when the 
dissipation is of a higher polynomial order. This is an immediate consequence of Theorem 

4.1. 

Corollary 4 . 2 . Suppose that the stochastic system U n is generated through the solution of 
the SEE 

du t = fi>{ut)dt + T,dWt, 
as U n = u n h, while for some 5, K > 0 

(u,fi>(u)) < —5\u\ 1+s + K. 

Then the kinetic energy principle, Assumption 2.2, holds with any fixed fih £ (0,1). By 
Theorem 4-1, if rank(H) = d, then the expected square norms of the ensemble members for 
EnKF and ESQF are bounded uniformly in time. 
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C\u t \ 2 = 2 (ut, if{u t )) + tr(SS T ) < —26\u\ 1+5 + 2 K + tr(SS T ). 

By Holder’s inequality, for any a > 0, there is a constant D such that 

£\u t \ 2 < -a\u t \ 2 + D. 

By Gronwall’s inequality and Dynkin’s formula we obtain that 

ElUfi 2 = E\u h \ 2 < e~ ah \u 0 \ 2 + Da- 1 = e~ ah \U 0 \ 2 + Da' 1 . 

So letting a = — ln(l — f3h)h~ l we conclude our proof. □ 

The higher order dissipation described in Corollary 4.2 has been used in many stochastic 
turbulence models to obtain better stability, like the canonical scalar model with cubic 
nonlinearity [28, 29] and the conceptual dynamical model for turbulence in [30]. Moreover, 
this corollary indicates that in the full rank case EnKF can be stabilized if we are willing 
to filter with a model error that stabilizes the system. For example instead of running the 
EnKF with vector held if of Lorenz 63 or 96, we can run EnKF with the altered system 
if = if — \\u\u, with any strictly positive A, then by the above results the filter will have 
bounded observable energy on an infinite time horizon. 

We now address the situation where an energy principle holds for a linearly translated 
version of the typical energy. That is 

E n _i\HU n - u *| 2 < /3\HU n -i -u *| 2 + K 

for some fixed u* G ML Then by Young’s inequality Lemma Appendix A.2 

E n -i\HU n \ 2 < (1 + e)E n _!| HU n ~ u*\ 2 + (1 + e^KI 2 

< (1 + e){P\HU n -1 -u *| 2 + K} + (1 + e~ 1 )|w*| 2 

< (1 + e) 2 /3\HU n -i\ 2 + (1 + e _1 ) 2 (A” + 2|u*| 2 ). (4.1) 

Therefore our framework can be applied to the Lorenz 63 system, Example 2.6, as long as 
iL/diag{r, a, a} satisfies the condition number requirement of Theorem 4.1. 

4-2. Observation with rank deficiency 

In the case where H is rank deficient we do not expect the observable energy criterion to 
hold for broad classes of models. This constraint is in a sense not surprising, since EnKF 
and ESQF with sparse observations can potentially diverge to machine infinity, in a well 
documented phenomena known as catastrophic filter divergence [11, 4, 13]. Nevertheless, 
under certain scenarios one can verify Assumption 3.1 through explicit calculation. We will 
now verify the observable energy criterion for two concrete examples. 
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Example 4.3. Consider a stable linear dynamical system given by U n = AU n _i + Q n , where 
A produces a contraction, 

\Au\ < (1 — fi)\u\ for all u , 

with a e (0,1). For simplicity, we assume that ( n are i.i.d. random variables with mean 0 
and covariance R. Suppose that H commutes with A, AH = HA, then 

E(\HU n \ 2 \U n _ L =u) = \HAu \ 2 + E\H( n \ 2 

= \AHu \ 2 + tr(HRH T ) 

< (1- f3) 2 \Hu\ 2 + tr(HRH T ). 


Hence Assumption 3.1 holds. 

Example 4.4. Recall the Lorenz 63 model, Example 2.6, where U n is given by u n h = 
{x n h, y n h, z n h), where the dynamics of u t is specified by 


^_x t = a(y t -x t ), ^y t = x t (r - z t ) - y t , ^z t = x t y t - bz t . 


Suppose that we have direct noisy observations of the latter two coordinates, that is 
H = diag{ 0,1,1}. We can define the linearly translated observable energy as Sniut) = 
y 2 + (z t — r) 2 . Direct computation yields 


d 

dt 


£H(ut) 


-2 y 2 - 2 bz 2 + 2 brz t < -7 S H (u t ) + r 2 , 


where 7 = min{2, b}. Therefore by Gronwall’s inequality, 


S H {u t+ h) < e lh £ H (u t ) + hr 2 . 


Following our manipulation for linearly translated energy, (4.1), we can conclude that 
Assumption 3.1 holds by taking a sufficiently small e < 1 in the following application of 
Lemma Appendix A.2 

I HU n \~ = y 2 h + z 2 h < (1 + e)£ H (u n h ) + (1 + e 1 )r 2 

< (1 + e)e ^Sh( u( n -i)h) + (1 + e)hr 2 + (1 + e 1 )r 2 

< (1 + e) 2 e lt {y\ n -i) h + zf n _^ h ) + 3(1 + e 1 )r 2 + (1 + e)hr 2 
= (1 + e) 2 e~' rt \HU n -i\ 2 + 3(1 + e^fr 2 + (1 + e)hr 2 . 


5. Geometric ergodicity of the ensemble based Kalman filters 

The objective of this section is to verify geometric ergodicity for the signal-ensemble process. 
In particular, if P denotes the Markov transition kernel for the signal ensemble process, then 
we will show that there exists a constant 7 G (0,1) such that 
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where /i, v are two arbitrary initial probability distributions, C is a time uniform constant 
that depends on n, v, and || ■ ||tv denotes the total variation norm. Furthermore the 
nonlinear filter has a unique invariant attracting state, the analogue of the asymptotic filter 
for linear Kalman filters [31, 4], Hence, geometric ergodicity implies that discrepancies in 
the initialization of the ensemble filters, which is usually inevitable in practice, will dissipate 
exponentially with time. 

To prove the ergodicity of the signal-ensemble process, we invoke a standard result of 
Markov chain theory [18]. Here we use a simple adaptation of the form given in [19, Theorem 
2.3], 

Theorem 5.1. Let X n be a Markov chain in a space E such that 

(i) There is a strong Lyapunov function £ : E i—>■ M + for the Markov process X n 

(ii) For any fixed M > 0, the compact set C — {x : £{x) < M} satisfies the minorization 
assumption. That is, there is a probability measure v with v(C) = 1, and a p > 0 such 
that for any given set A 

P(X n G A\X n _i = x)> rju(A) 

for all x G C. 

Then there is a unique invariant measure n and constants r G (0,1), k > 0 such that 

||P A 1 (X n G •) — t^\\tv < Kr 11 ^1 + J £(x)n(dx) 

In the following we discuss the conditions we require in order to apply Theorem 5.1. In 
the previous sections, we established the existence of Lyapunov functions for signal-ensemble 
processes. In particular, when LI is of full rank, Theorems 3.2, 3.3 and 4.1 provide simple 
criteria which guarantee that the filtering process satisfies the following assumption 

Assumption 5.2 (Existence of a strong Lyapunov function). There is a function £ : 
R d x R dxK —y M + with compact sublevel sets and constants K h ,fih > 0 such that 

E.,-1 £(U n , K (1) ,.... kP 0 ) < (1 - Ph)£(U„- 1, »£>,.... o + K h . 

Although Assumption 5.2 may be difficult to hold in general scenarios, the authors have 
found an adaptive covariance inflation scheme that always guarantees Assumption 5.2, which 
will be reported in a separate paper [22], This assumption provides the first hypothesis of 
Theorem 5.1. 

In order to verify the minorization condition of Theorem 5.1, we need to assume there 
is a density for the noise ( n appearing in the time discrete model ( 2 . 1 ) . 

Assumption 5.3 (Nondegenerate system noise). For any Mi,M 2 > 0, there is a constant 
a > 0 such that 

P(Cn e • | U n —\ =u) > uAm 2 (-) 

for all |w| < Mi, where \M 2 (dx) is the Lebesgue measure ofR d restricted to {u : |w| < M 2 }. 
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Assumption 5.3 holds for many practical examples. When U n is produced by time 
discretization of an SDE (2.2), it suffices to require E being nonsingular, please see Appendix 
B for a detailed discussion. In other situations where U n is produced genuinely as a random 
sequence, Q n is usually a sequence of random Gaussian variables, which Assumption 5.3 also 
satisfies. 


5.1. Controllability 


In this subsection, we establish a framework which verifies the minorization condition using 
Assumption 5.3 and controllability of the Kalman update map. The signal-ensemble process 
X n \= (U n , Vn 1 \ ..., Vn K) ) is a Markov chain taking values in X = R d x R dxK . For all three 
ensemble filters, the evolution of X n is described by the composition of two maps. The first 
is a random map from A to a signal-forecast-observation space y , described by a Markov 
kernel $ : X x B(y) —> [0,1]. The second is a deterministic map T : y —>■ X, which combines 
the forecast with the observed data to produce the updated posterior ensemble. The details 
of these maps, as well as the definition of the intermediate space y, differs between EnKF, 
ETKF and EAKF. 

For EnKF, the intermediate space is y := R d x R dxK x R qxh and the random mapping 


is 


(Un-1 , KS,.... v™) ^ Y n := (t/„, V (1 >,.... q A '>, Z«,.... Zf >) 


r(K) 


The deterministic map T is given by 


r(i4. ■■■, vi K \ z<»,.... zf >) = (u n , r< K >) 


where 


r (k) = y(k) _ c H t^j + HCH T )~\HV {k) - Z (k} ) (5.1) 

6 = j^-^^(v (k) -v)®(v (k) -v) v = . 

k =1 k =1 

The corresponding formulas for ETKF and EAKF will be given in Sections 5.3 and 5.4 
respectively. 

Given this formuation, it suffices to show that the push-forward kernel r*<f>(a;, •) = 
^(a:, T _1 (-)) satisfies the minorization condition. It is easy to see that, given the assumptions 
on the noise, the kernel 4>(a:, •) has a density with respect to Lebesgue measure, so we simply 
need to show that the pushforward inherits the density property from <f>. To achieve this, 
we use the following simple fact. 

Lemma 5.4. Let & be a Markov transition kernel from M n —> M. n x M m with a Lebesgue 
density p(x,y ) = p(x, ( 2 / 1 , 2 / 2 )) and let T : M n x M m —> M n . Given a compact set C, suppose 
that there is a point y* = (yfiy^) £ x M m and (3 > 0 such that 

(i) For all x G C, the density function p(x, y) > (3 for y around y* , 
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(ii) T is C 1 in a neighborhood of y* and det(P yi r)| y * > 0 . 

Then there is a 5 > 0 and a neighborhood 0\ ofT(y*) such that for all x G C 

r$(x,-) ><ja O i (0 

where Ao x is the Lebesgue measure restricted to the set 0\. In other words, the minorization 
condition holds for the transition kernel T*<3>. 

Proof. By continuity and compactness, we can find a small neighborhood O of y*, a 
neighborhood O' of (T(?/*), y%), and a small positive number e > 0 such that for all 
X eC, (yi,y 2 ) g O 

P(x,(yi,y 2 )) > e, e _1 > | det(V yi T(y 1 , y 2 ))\ > e . 

such that T is a C l diffeomorphism from O to O'. Let D 0 = {(yi,y 2 ) : P(yi,y 2 ) G A,y 2 e 
B, (yi,y 2 ) e 0} then, using the change of variables (yi,y 2 ) i-G (T(yi,y 2 ),y 2 ), we have by the 
change of variables formula 

$(x,D 0 ) := I t A (r(y 1 ,y 2 ))l B (yi,y 2 )p(x,yi,y 2 )dy 1 dy 2 

= I l(AxB)(z,y 2 )t 0 '(z,y 2 )q(x, ( z,y 2 ))dzdy 2 

where 

q(x, {z,y 2 )) =pO,t/i, 7 / 2 ) det ^ y ' Z = p(x, y u y 2 )\dei(V yi T{ yi , y 2 ))\~ 1 - 

t^y i y2 tJy 2 y 2 

with z = V(yi,y 2 ). By construction, q is strictly above e 2 for x G C and ( z,y 2 ) G O'. Pick 
neighborhood O i of z* = T(|/*) and 0 2 of y 2 such that 0\ x 0 2 C O'. Then, since the set 
r” 1 (A) fl O is of the form Do (with B = M m ) we can apply the above change of variables 
formula to obtain 

<f>(x,r _1 (A)) >$(i,r“ 1 (d)nO) = J t A (z)to>(z,y 2 )q(x,z,y 2 )dzdy 2 

> / t A (z)q(x,z,y 2 )dzdy 2 
J 0\ XO 2 

> e 2 A(0i)A(0 2 n A) . 

Then taking 5 = e 2 A (0 2 ) satisfies our requirement. □ 

5.2. Ergodicity for the EnKF 

In the application of Lemma 5.4 to the signal-ensemble process, we will use the variables 


x = (U n - U v?\,... 
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yi = ([/■„, pm,..., i/f>) 

The choice of the intermediate point (yfiy^) can be quite delicate. Although the non 
degeneracy of the Jacobian should in principal be verifiable for general T, a well chosen 
intermediate point can simplify the computation significantly. 

With Theorem 5.1 and Lemma 5.4, the verification of EnKF becomes rather straight 
forward. 

Theorem 5.5. If the unfiltered signal process U n has an kinetic energy principle with 
nondegenerate system noise, and the EnKF signal-ensemble process has a strong Lyapunov 
function, in other words Assumptions 2.2, 5.3 and 5.2 hold, then the EnKF signal-ensemble 
process is geometrically ergodic in total variation distance. 


Proof. Fix any M\ > 0, we apply Lemma 5.4 to the compact set 

C= |(u,u ( 1 ) ,...,u w ) : \u\ 2 + J2\v {k) \ 2 < AlX 

Pick the intermediate point y* with all its components at the origin. It is easy to see that 
the first condition of Lemma 5.4 holds. Indeed the random variable (n\f n , £n^) satisfies 
the density condition by assumption and (y \, yf) is obtained from this random variable via 
an onto linear transformation, hence ( 2 / 1 , 2 / 2 ) inherits the density condition. 

It is also elementary to verify the differentiability and nondegeneracy of T at y *, 
where T is defined by (5.1). Indeed, using the formula for gradients of inverse matrices 
T>L~ l = —L~ 1 'DLL~ 1 , it is clear that T is a polynomial combination of several continuously 
differentiable functions and in particular must be C 1 near y*. To prove non-degeneracy, 
notice that both C and 'D yi C vanish at y*. Using this fact, a simple calculation yields 

v ri * = i 

^yi L I y 1 j 


which proves non-degeneracy and hence the Theorem follows from Lemma 5.4 and Theorem 


5.3. Ergodicity of ETKF 

For both ETKF and EAKF, the intermediate space is slightly different since there are no 
longer perturbed observations. In particular we have y : = x R dxK x IP and the Markov 
kernel $ : X x Biy) —» [0,1] is described by 

(V.-1, W,, .... vS) ^ (fpm, .... v,[ K \ z n ). 

The deterministic step is given by the map T(f/, V, Z ) = (U, T^ 1 ),.... 5 T^U) where 

r (fe) = v - ch t (i + hch t )-\hv -z) + s (k) 


(5.2) 
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where V = A 1 an d ^ = k~[ — V) ® (R^P — V) and S^ is the fc-th 

column of the updated spread matrix S = ST(S) where S is the forecast spread matrix 
S = (V^ 1 ) — V,..., — V) and T(S) is the transform matrix. We have not yet defined the 

transform matrix T(S), other than to require that it yields the covariance condition (2.9). 
One reasonable choice for the transform matrix, which we will adopt, is the matrix square 
root 

T(S) = (I K + (K - 1 )~ 1 S t H t HS)~ 1/2 = (/,.■ - (K - 1 )~ 1 S T H T (I + HCH t )-‘HS t ) 1/2 . 

(5.3) 

Note that the square root is well defined and unique since the argument is symmetric and 
positive semi-definite. We will now apply Lemma 5.4 with 

x = {U n - 1 ,v£! ij ... 1 V™) yi = {U n ,vW,...,vW) y- 2 = Z n . 

and the intermediate point y* = (yfi yf) = ( 0 , 0 ). 

Theorem 5.6. If the unfiltered signal process U n has an kinetic energy principle with 
nondegenerate system noise, and the ETKF signal-ensemble process has a strong Lyapunov 
function, in other words Assumptions 2.2, 5.3 and 5.2 hold, then the ETKF signal-ensemble 
process is geometrically ergodic. 

Proof. Fix any number Mi > 0 and define the compact set 

C = |(n,n (1) ,...,n (fc) ) : \u\ 2 + ^ |u (fc) | 2 < M 1 

As with Theorem 5.5, we pick the intermediate point y* to be the origin. Showing that <f> 
satisfies the first condition of Lemma 5.4 is identical to Theorem 5.5, hence it suffices to 
show differentiability and non-degeneracy. 

By Lemma Appendix D.l, the transform matrix T(S) is continuously differentiable and 
hence it follows trivially that T is C 1 near y*. For the non-degeneracy condition, we have 

that _ _ 

V v y k) = V yi V - V yi {CH t (I + HdH T )-\HV - Z)) + V yi S (k) . (5.4) 

Precisely as in Theorem 5.5, the second term on the right hand side vanishes at y*. For the 
third term, we have by Leibniz rule 

V yi S = V yi ST(S) + SV yi T(S). 

But by the definition of T(S), it is clear that V yi (T(S)T(S)) vanishes as y* and that T(S) = I 
at y*. Hence we have 

0 = V n (T(S)T(S))\,- =2V n T(S)\ s . , 

thus V yi S\ y * = D yi S\ y *. Returning to (5.4), we see that 
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It follows that T> yi T\ y * = / and as in Theorem 5.5, this completes the proof. 

□ 

Remark 5.7. ETKF has different formulations to (5.3), where T(S) is obtained by 
multiplying the original formula by a rotation matrix on the right. The same principles 
apply to these ETKF, as long as the rotation matrix is differentiable around the intermediate 
point. However, the origin may not be good choice of intermediate point, since the rotation 
map can be singular to perturbations around the origin. In this case, one must use the 
methods in the following section for EAKF, where we deal with the same issue. 

5.4-. Ergodicity of EAKF 

In this section, we apply the same strategy as EnKF and ETKF to obtain geometric 
ergodicity for EAKF. For EAKF, the intermediate space y and the Markov kernel <f> are 
identical to those of ETKF. The deterministic map T is still defined by (5.2), but now the 
spread matrix S is defined by S = A(S)S where A(S') is an adjustment matrix. As with 
ETKF, the adjustment matrix can be any matrix that ensures the covariance condition (2.9). 
In the next section, we will discuss how to construct such an adjustment matrix. 

5 . 4 .I. Detailed formulation of EAKF We adopt the formulation of EAKF from [3, 4], 
described by the following steps. 

(i) Compute the SVD decomposition of S, denoted by S = QAR. When there is rank 
deficiency, i.e. A is not square or not invertible, we can further decompose Q and R into 
parts corresponding to null and complementary subspaces: 

s = [Qi Q2I ^ ° p 1 = Q1MR1 ( 5 . 5 ) 

J U U it 2 

where Ai is a square diagonal invertible k x k matrix with k < min(d, K). Without loss 
of generality, we assume Ai has its diagonal entries descending. 

(ii) Let G T DG be the eigenvalue decomposition of (K — l)~ l A T Q T H 1 HQ A where D is 
positive semi-definite with decreasing diagonal entries. As in the first step, in the rank 
deficient case we can write 

G^DiGi = (K - l) _1 A 1 QfR T IL(5 1 A 1 
where G\,Di are kxk matrices. 

(iii) In the general (rank deficient) case, the assimilation matrix is given by 

A(S) = Q 1 A 1 G 1 (/ + Dj-^Af'Ql (5.6) 

or equivalently updating the spread matrix to be 


s = q 1 a 1 gI(i + d 1 )~ 1 Rr 1 . 
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In the full rank case this reduces to the more well known formulation 

A(S) = QAG T (I + D)- l ' 2 A ] Q T . (5.7) 

In existing EAKF literature, the rank deficient case is rarely examined except as a 
footnote in [32], However, rank deficiency is unavoidable when the ensemble size K is less 
than the model dimension d. As a matter of fact, in these degenerate scenarios, the choice 
of the eigen-basis is very subtle, and simply following the classical formulation (5.7) with 
different choice of G could end up with inaccurate posterior covariance. The assimilation 
matrix formulation (5.6) we used here guarantees that the posterior covariance follows the 
Kalman update relation (2.7). In practice, direct application of (5.7) in most mathematical 
programs with default settings will generate the same update rule. A more detailed discussion 
of these issues are presented in Appendix C. 

Since the above EAKF formulation relies on a singular value decomposition, the choice 
of singular vectors and hence the map T has a non-unique definition. To show geometric 
ergodicity, we rely on differentiability properties of this map T. Hence we must rely on a 
rigid definition of the map T involving a fixed choice of singular value decomposition. From 
here on, any function T that fits into the above formulation will be called an EAKF update 
map. The following theorem, which is the main result of the section states that there exists 
a choice of EAKF update map that renders the EAKF process geometrically ergodic. Due 
to the complexity in defining T, statements concerning the ergodicity of an arbitrary EAKF 
algorithm seem out of reach. 

Theorem 5.8. There exists an EAKF update mapping T such that if the signal-ensemble 
process generated by it has a strong Lyapunov function, and the unfiltered signal process U n 
has an kinetic energy principle with nondegenerate system noise, in other words Assumptions 
2.2, 5.2 and 5.3 hold, then the EAKF signal-ensemble process is geometrically ergodic in total 
variation distance. 

5.f.2. The choice of intermediate point Unlike the ETKF case, we cannot pick the 
intermediate point to be the origin. This is because at the origin, the spectrum of S T H T HS 
clusters at 0 and a perturbation of S may split the eigenvalues into branches while leaving 
the eigenvector basis matrices G and R to be singular [33]. We will instead choose an 
intermediate point that has simple nonzero spectrum. 

The intermediate point y* = (y* Xl yf) will be defined using a matrix Mo constructed 
below. Precisely, we choose 

Vi = {IT, V* {1 \ ..., V*W) = (0, M 0 (1) ,..., M 0 W ) y* 2 = Z* = 0 , (5.8) 

where M x f } denotes the j-th column of Mo. By construction of Mo, this choice will satisfy 
V* = 0 and hence S* = Mo and moreover will attain the highest possible rank for such a 
matrix. As we shall see, these properties ensure that the EAKF map is (locally) well behaved 
under perturbations. 
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In the sequel, we assume that the observation matrix H is diagonal with rank q < d. 
Using the change of coordinates described in Remark 2.1 as well as the fact that Assumptions 
5.2, 5.3 and the statement of geometric ergodicity hold equivalently in both coordinate 
systems, this can be achieved without loss of generality. 

Lemma 5.9. If M is a d x K matrix such that Ml = 0 then the rank of M is at most 
r := ruin {A" — 1 ,d}. Moreover, if we define the d x K matrix M 0 by 

oj if d < K — 1 
if d>K-l 

where 1 is the d x 1 vector of 1 ’s and S 0 is the r x r matrix 

" 1 1 ... 

So= 1 1-3 

1-2 0 
-10 0 

then A4 q has the following properties 

(i) Mq 1 = 0 , rank(Mo) = r and all nonzero eigenvalues of Mo are simple. 

(ii) Mo has the SVD Ad 0 = QqA 0 Ro where Qo is the d x d identity matrix and the last row 
of R 0 is 1 T . 

(Hi) When H is a diagonal matrix with descending diagonal entries, (K—l) -1 A 0 Qf H T HQ 0 Ao 
has eigen-decomposition GfD 0 G 0 , where D 0 has descending diagonal entries and is of 
rank th '■= min(g, K — 1), and Go is the r x r identity matrix. 

Proof. It is elementary to see that rank(M) < d. Also, since M has the same rank as Ad T M, 
while M t M has 1 as a null right vector, so rank(M) < K — 1. For the claims for M 0 , direct 
verification will be sufficient, where the orthogonality between different rows of M 0 easily 
leads to 

M 0 Mq = diag{r(r + 1),..., 2 x 3,1 x 2, 0 ...} 

Clearly their spectrums are as requested. The claims for Go and Do easily follow from direct 
verification. □ 

5.f.3. Differentiability of EAKF near the intermediate point The intermediate point y* was 
chosen to ensure stability of the eigenvalues of S T S and A±Qf H 7 HAiQi as a function of y, 
in a neighborhood of the intermediate point. We will now demonstrate this. 
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Lemma 5.10. We can construct an EAKF update map T which is continuously differentiable 
in a neighborhood of the intermediate point y*. 

Proof. Recall that T is defined by T = ( U , , V^) where 

v (k) = v + s {k) 

for each k — 1,..., K where is the k -th column of S and 

S = Q^iGKl + V — V — CH t (I + H T CH)~ l {HV — Z\ . 

The only unspecified parts of the map are the matrices Q, A, R and G (and thus Qi, A 1? R x 
and G'i), hence these must be constructed. It suffices to show that all terms appearing in 
the above map, including the constructed terms G\,R\, are continuously differentiable as a 
function of y = (U, ..., V^ K \ Z ) in some neighborhood of y*. 

Clearly V, S and S T S depend smoothly on y, differentiability of V is shown in the proof 
of Theorem 5.5. The latter two can be directly seen from the definitions of the mean and 
spread update maps in (2.8) and (5.7). 

We claim that there is a C l extension of Q , R as Q(y),R(y) in a neighborhood of y = y* 
such that 

(i) Q(y) and R(y) are the eigenbasis matrices in the decomposition of S(y)S T (y) and 
S T (y)S(y) respectively. 

(ii) There is a diagonal matrix A (y) such that S(y) = Q(y)A(y)R T (y). 

To verify the first claim, note that all r nonzero eigenvalues of S T (y)S(y) are by construction 
simple at y*, hence they are smooth with respect to y by II.2.2 of [33], and in particular stay 
positive, distinct and maintain the same descending order for y near y*. Moreover, because 
the rank of S T (y)S(y) is at most r, 0 will be an eigenvalue of multiplicity K — r for y near 
y*. By Lemma Appendix D.2, there is a transformation matrix U(y) that transforms the 
simple nonzero eigenvectors and null space of S T (y*)S(y*) to the ones of S T (y)S(y). So 
R(y) ■= RoU(y) T provides an eigenbasis matrix for S(y) T S(y) for y near y*. Likewise we 
can also find a Q(y) as the eigenbasis matrix for S(y)S T (y) for y near y*. 

We will now check the second point of the claim. If there is another SVD, S(y) = QAR 
at y close to y* with A having descending eigenvalues, then R has its first r rows being the 
eigenvectors of S(y) T S(y) associated with descending nonzero eigenvalues, and the remaining 
K — r rows correspond to the basis of the null space. Therefore, R(y) has the first r rows 
being either identical to the ones of R or their additive inverse. Likewise we have the 
same conclusion for Q(y). Then because the choice of eigenvectors for the null space are 
unimportant, which can be told from the fact that Q 2 ,R 2 play no role in (5.5), we have 


A(») := Q(y) T S(y)R(y) T 
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is diagonal, and each component is either the same as A or the additive inverse. Yet A (y) is 
continuous with respect to y, so for there exists a neighborhood of y* so that they remain 
nonnegative, this indicates A (y) = A and completes the proof of the two claims. 

Similarly, we can find a C 1 function G\(y) defined in a neighborhood of y* such that 
Gi(y*) = I, and such that Gfiy) is an eigen-basis matrix for Ai(y)Qi(y) T H T HQi(y)Ai(y) 
with corresponding diagonal matrix Dfiy), as in part (ii) of the EAKF formulation. 

Finally, the function (/ + Df) 1 ^ 2 is always C 1 in y. To see this, note that 

D\ = (K - 1)~ 1 G 1 A 1 Q^H t HQ 1 A 1 G'[, 

which is C 1 in y. But since D\ is diagonal with entries nonnegative, it follows that (I+D\)~ l C 
must also be C 1 in y. Hence all terms involved in T are C 1 in a neighborhood of y* and the 
proof is complete. 

□ 


5 . 4 . 4 . Controllability of EAKF at the intermediate point 

Lemma 5.11. The EAKF update map T constructed in Lemma 5.10 has its Jacobian at y* 
being non-degenerate. 

Proof. Before proceeding, we recall some notation. Let S = S(y) be the spread matrix 
constructed from y, similarly let Qi,Ai,Ri,Gi,D,Di be the matrix valued functions of y 
that were constructed for the purposes of the EAKF map in Lemma 5.10. In the proof below, 
we will frequently use the subscript 0 to denote a matrix valued function being evaluated at 
the intermediate point y = y*. 

Recall from Lemma 5.9 that the SVD of S(y*) = M 0 is given by 

S(y*) = I d A 0 R 0 = I d 

Since the last row of i ? 0 ,2 is 1 by Lemma 5.9, we can also write the SVD decomposition as 


1 

> 

0 

0 


So 

O 

_1 

0 0 


P().2 


-fdA 0 i? 0 — I d 


A 5 O' 


~Rs 

0 0 


1 T 


5 


where Rs is the first K — 1 rows of R 0 , and A 5 is the upper (K — 1) x (K — 1) sub-block of 
A 0 . We are only interested in the derivatives of T in the directions ij\ = [U, ..., 

these directions clearly form a d(K + 1) dimensional subspace. We can always write this 
subspace as 

M := {[ u,v®I k + BR s } : u,v E R d ,B E M dx(A '- 1} }. 

Indeed, it suffices to see that any spread matrix S can be decomposed S = BRs for some 
suitable B. 
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To prove the lemma, suppose that there exists a direction M e M. such that the 
derivative of [U, V^\ ..., V^} in the direction M vanishes at y*. We will show that M 
must be zero. Write M — [u,v® 1 k + BR S \. If the derivative vanishes at y*, we must also 
have 

T>mU = T>m V = 0, V M S = 0. 

at y*. Through the following steps, we will show that M must be zero. In steps 1 — 3 we 
show u = v = 0 and in steps 4) — 9) we show that B = 0. 

Step 1) u — 0: This is a direct consequence of V M U = u. 

Step 2) If C — SS T then V M C = 0 at y*: Note that since V M S = 0 at y*, 

v M [c - ch t (h t ch + iy'Hd] = v M {ss T ) = o, 

where C = SS T is clearly differentiable. Using the formula of the derivative of an 
inverse, we find the previous equation is equivalent to 

0 =V M C - V m CH t (H t CH + I^HC - CH t (H t CH + i)- 1 hv m c 
+ ch t (h t ch + i)- 1 h t v m ch(h t ch + iy'Hd. ( 5 . 9 ) 

Now let Cij denote the entries of T>mC and let A*, h z denote the diagonal entries of 
the diagonal matrices A 0 and H respectively. Since the right hand side of (5.9) has 
all matrices being diagonal except T>mC, we can compute it explicitly, and find the 
ij- th entry is 

(A^+ir^+irv 

Therefore (5.9) implies that V M C — 0 at y*. 

Step 3) v = 0: Since T>mV = 0 and V is given by 

V = V - CH t (I + iFCHy^HV - Z] 

so using the fact that T>mC = 0 at y*, 

V M V = V M V - CH t (I + H t CH)- 1 [HV m V] 

which at y* can be computed explicitly because the matrices are diagonal: 

{v M v)i = (i + hfyy^iVMVfi => o = d m u = v. 

Step 4) XjBij + A iBji = 0 and the diagonal terms of B are zero: From here and after, the 
range of index (i,j) is in {1,».., d} x {1,..., K « 1}. Our claim comes from the fact 
that C = SS T and T>mS = BR$ so at y = y* we have 

0 = V M d = (V M S)S T (y*)+S(y*)(V M S) T = BR 0 S T (y*)+S(y*)R%B T = BA^+A 0 B T . 

Then notice that B is d x (/l — 1) dimensional, so its diagonal terms correspond to 
the first r = min{(i, K — 1} nonzero entries of A*, therefore they are zero. 
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Step 5) T>mQi = 0,V m Ai = 0 at y*: Because Q\ is formed by eigenvectors of matrix 
C associated with the nonzero simple eigenvalues, which are the entries of Ai, by 
Lemma Appendix D.2 and formula (D.3) we have our claim because T>mC = 0. 

Step 6) [DM(Ri)Rs]ij = (A iBij + A jBjf )/(Af — A 2 ) at y* for i j: by Lemma Appendix D.2, 
if we denote r* to be the i-th row of _R 0 ,i, while T; being eJYj, then because Rf 2 Ro,2 
is the projection to the null space of S T S at y*, 


v M ri = nV M {S T S) 


A 


-2 d T 


R 0 ; 2 Ro,2 


+ ^(A* _ ^ 

k^i 


ir^iVk 


So for 1 < j < r, because = 1 k=j r J, RoyRo^J = 0) we find that 

[D M (Ri)R's]i j = V M { ri )rj = (A, 2 - A *)- 1 r i V M {§ r S)rJ. 

For r < j < K — 1, because x \’ j. 'B/, / J = 0, Rf :2 Ro,2 r J = rj, we find that 

[V^RfiR^ = D,,(/•;)/•]' = A f ri V M (S T S)rj. 

Then using r t Rfi = e*, the row with zero component except being 1 on i-th 
coordinate, 


r/Di\ 


= r l {R 1 s B 1 X,Rs + R 1 s N 1 ,BR s )r 1 3 =X j e l B^e j 


T " T + XeiBei 


T 

j ’ 


which concludes our claim. 

Step 7) V m Gi = 0, T>mD\ = 0 at y*: Because GfD\G\ is the eigenvalue decomposition of 
(K — iy 1 AiGf H T HGiAi, due to Step 5), the derivative in the direction M must 
vanish at y*, so by Lemma Appendix D.2 we have our claim. 

Step 8) T>mD = 0 and T>mDi = 0 at y*: recall that G T DG is the eigen-decomposition of 
SH 1 HS and D\ is the upper r x r block of D. Notice that D has its entries being 
the eigenvalues of S T H T HS, which is the same as H T SS T H = H T CH, but from 
Step 2) T>mC = 0 at y*, so we have our claim. 

Step 9) B = 0: By step 4), we only need to verify non-diagonal entries. From T>mS = 0, 
S = QiAiGf(I + Dfi~ 1 X 2 R 1 and results from Steps 4), 6), 7), we have 

0 = Q^G^I + D 1 )~R 2 V m R 1 


which by QiAi = Ai which is invertible at y* and G\ = I r at y* leads to 

0=(I + D 1 )- 1 ^ 2 (V m R 1 )R^ 

at y*. Because (/ + Lb) -1 / 2 is a diagonal matrix with entries (1 + A 2 /i 2 ) -1 / 2 , using 
the results from Step 6), the (i,j)-th entry of the right hand side is 

A iBij + A jBji 
\/l + Afft|(Af - A 2 ) 



Nonlinear stability and ergodicity of ensemble based Kalman filters 


30 


so A iBij + A jBji = 0 for i j. We claim that A* 7 ^ A j for i j, this is because 
A i = A j only when they are both zeros, yet either i < d = r or j < K — 1 = r, and 
hence A* or A j is not zero. Recall that from Step 4) we have that A jBij + A iB^ = 0, 
combining it with our results that A iBij + XjBji = 0 and A,; 7 ^ A j, this can only be 
possible when B l} = 0. 

Therefore, the Jacobian of mapping T can not be degenerate at y*. □ 

We now have the ingredients required to prove our Theorem 5.8 

Proof of Theorem 5.8. In light of Lemmas 5.10 and 5.11, the result follows identically to the 
proof of Theorem 5.6. □ 

6. Conclusion and Discusion 

In the preceding pages, we have established a simple analytic framework for validating 
two important nonlinear stability properties of ensemble based Kalman Liters, namely 
boundedness and geometric ergodicity for the signal-ensemble process. These are important 
properties in practice, as they guarantee the Liter processes will remain bounded on an 
inbnite time horizon and that initialization errors in the algorithm dissipate exponentially 
quickly in time. 

In Section 3, upper bounds for the signal-ensemble processes are obtained via a simple 
Lyapnuov function argument. In particular we show that, if the signal satisLes the observable 
energy criterion , introduced in Assumption 3.1, then one can construct a Lyapnuov function 
for the signal-ensemble process. The sub-level sets for this Lyapnuov function are only 
compact in the observed directions. Hence this can be thought of as an upper bound for 
the observable ensemble {HV^}^ =1 . This Lyapnuov function argument is used to construct 
upper bounds for EnKF, ETKF and EAKF. 

Section 4 discusses the applicability of the observable energy criterion. Heuristically 
speaking, systems with strong dissipation in kinetic energy and complete observations as 
well as suitable spatial observations will satisfy the observable energy criterion, therefore 
their EnKF and ESQF ensemble will be bounded uniformly in time. On the other hand, 
systems without observable energy criterion have the potential to diverge to machine inhnity 
in Lnite time, which is known as catastrophic Liter divergence. This phenomenon has been 
captured in previous numerical experiments [11, 4, 13]. In a separate article, the authors 
have constructed a concrete nonlinear system with kinetic energy dissipation but without 
observable energy dissipation, whose EnKF ensemble diverges to inhnity exponentially fast 
with large probability [21], 

In Section 5, geometric ergodicity is established for the signal-ensemble processes of 
EnKF, ETKF and EAKF. This is achieved by combining the existence of a Lyapnunov 
function (with compact sub-level sets) with a minorization condition. The existence of 
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a Lyapnuov function can be guaranteed using the results from Section 5, but any other 
Lyapnuov function would suffice. 

While our results have shown that systems with dissipative observable energy will 
produce stable filter processes, it is hard to believe that Assumption 3.1 is necessary for 
stability. As discussed earlier, numerical evidence suggests that ensemble based Kalman 
filters are typically very stable and catastrophic filter divergence only happens when the 
observations are sparse. The stability and ergodicity of the filter processes here are inherited 
from the original nonlinear system. This type of inheritance phenomenon generally exists 
for many filter processes, for example Kalman filter preserves linear stability, and optimal 
filters preserves absolute regularity of general Markov processes [31, 34, 15, 16]. In order to 
extend our results, one might seek new dynamical properties that can be inherited through 
the Kalman assimilation step. 

In obtaining our results, we use few properties of the forecast covariance matrix other 
than positivity. On the one hand this lends generality to our results, in that the upper 
bounds hold for a broad class of ensemble methods, including covariance inflation schemes. 
On the other hand, if we had more control over the covariance one might hope to gain more 
control over the filter ensemble. Thus, it is quite natural to ask whether one can “inflate” 
the covariance adaptively in order to guarantee stability properties, even in the case where 
the observation H is of low rank. This question will be investigated by the authors in a 
subsequent paper [22], Similar nonlinear stability and ergodicity results as developed here 
are also valid for finite ensemble Kalman filters which are continuous in time [17] and will 
be reported elsewhere. 
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Appendix A. Elementary claims 

Lemma Appendix A.l. Let A be a positive semidefinite symmetric matrix, then the 
following holds: 

0 N A(A +1)- 1 N I, 0N(A + I)~ 1 NI, 0 A (A +1)- 1 ^ A- 1 , 

where A A B means that B — A is positive semidefinite. 

Proof. Since A(A + J) _1 + (A + J) _1 = /, it suffices to show 0 A (A + J) _1 ■< I. Since A 
is positive semidefinite and symmetric, it can be diagonalized through an orthogonal matrix 
T, i.e. A = TUT 7 with D being diagonal. Then based on (A + J) _1 = T (D + J) _1 T t , it is 
elementary to conclude our lemma. □ 
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Lemma Appendix A.2. By Young’s inequality, for any e > 0 and x, y G the following 
holds: 

\x + y\ 2 = M 2 + \y\ 2 + 2 (x,y) < (1 + e 2 )|a;| 2 + (1 + e~ 2 )\y\ 2 . 

Lemma Appendix A.3. For any two dxd positive semidefinite symmetric matrices A and 
B, the following holds: 

tr(A(A + B)- 1 ) = tr((A + B)~ l A ) < tr((A + B)~\A + B)) = d. 


Lemma Appendix A.4. 


5 = S[I K + (K - 1)~ 1 S t H t HS]~ 1 ^ 2 Q(F)[SS t ]~ 1 ^ 2 S 


Proof. Let N be a matrix with its columns form an orthogonal basis of the null space of S, 
i.e. 

N = [ei,... ,e u ], SN = 0, n + rank(F) = d. 

Then we can find F and X such that 


SS T 


[N,F] 


|"o 0 


1 

s 

_1 

1 

o 

M 


1 

_1 


I K + (K - ly^^HS = [N, F] 


|"o 0 


1 

_1 

1 

o 

M 


1 

CtH 

_1 


□ 


Appendix B. Discrete time formulation 

In many applications, the underlying dynamical system is given by an ODE or SDE, which 
in general can be written as 

du t = i>{u t ) + YdW t . 

However, the noisy observations of these systems are usually made not in continuous time, 
but rather sequentially in time, with time interval h. Therefore it is natural to see the 
stochastic process instead as a stochastic sequence U n = u n h as we do in Section 2.1. 
Theoretically speaking, we can always transform the SDE for ut into the nonlinear update 
map of U n as in (2.1), since if we write the transition kernel of process u t from time 0 to 
time h as Kh(u, dv), then it suffices to let 

^ h(u ) = E(u h \u 0 = U) = J vK h (u, dv), Cn = U nh - ^ h (U(n-l)h)- 

The reader should notice that the introduction of the nonlinear map T/, and random sequence 
( n is just for the convenience of our formal illustration and rigorous proof. In order to do 
the forecast step of EnKF or ESQF in practice, it is not necessary to find the concrete 
formulation of dT; it suffices to simulate the SDE starting from each posterior ensemble 
from time 0 to h, and let the forecast ensemble V r i k> be the realization of the simulation 
at time h. 
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It is also easy to obtain an energy principle, Assumption 2 . 2 , for the discrete time version 
U n = as long as the original SDE satisfies that (u, ijj(u)) < — 7 |w | 2 + k for some 7 , k > 0. 
Because then the generator of the square norm \u t \ 2 would satisfy 

£\u t \ 2 = 2 (u t , if(u t )) + tr(EE T ) < - 27 |w t | 2 + 2 k + tr(EE T ), 

which by Gronwall’s inequality and Dynkin’s formula yields 

r h 

E\u h \ 2 < e~ 2lh \uu\ 2 + / e-^ {h - s) (2k + tr(SS T ))ds 

Jo 

< e~ 2lh \u 0 \ 2 + h(2k + tr(EE T )). 

Moreover, when the stochastic covariance matrix E is nonsingular, by [35] the transition 
kernel Kh(u, dv) is equivalent to the Lebesgue measure on also by Theorem 38.16 of [36], 
this density is smooth both in u and v. As a consequence, the nondegnerate system noise 
condition, Assumption 5.3, holds because {(u,v) : |u| < Mi,\v\ < M 2 } is compact. More 
generally, Assumption 5.3 should hold for discrete time formulation of hypoelliptic systems 
where certain controllability conditions are satisfied, please refer to [19] and the reference 
therein. 


Appendix C. EAKF with rank deficiency 


In this section, we demonstrate that, when there is rank deficiency, in order to achieve 
the posterior covariance relation (2.7) for EAKF, it is necessary to use specific eigen- 
decompositions in the formulation of EAKF. In particular, certain choices of eigen- 
decomposition will result in a violation of (2.7). We will also show that the formulation 
employed in Section 5.4.1 does always satisfy (2.7). 

The following example illustrates that not all eigen-decompoisiton will satisfy (2.7). Let 


S 


1 

0 


-1 

0 


H 


0 0 
0 0 


then the SVD of S is given by 


5 = QAR = 


1 0 
0 1 


y/2 0 

0 0 


1 -1 

V2 V2 
1 1 
V2 s/2. 


Notice that AQ T H T HQ A = 0, so G can be chosen as any orthonormal matrix. One possible 
choice of matrix G is R 1 and D — 0. However, the spread update with G = R T following 
the classical formulation is 


S 


QAG T (I + D)~ 1 / 2 A^Q T S = 



s/2 

0 



-1 

0 


1 -1 

s/2 s/2 

0 0 
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This posterior spread S does not match the posterior covariance relation (2.7), which requires 
that SS 1 = SS T since H = 0 indicates there is no observation. It is necessary for us to 
choose G = I 2 here, which produces the right posterior covariance, 


S 


QAG T (I + D)~ 1/2 A^Q T S = 


1 

i 

o 

I 2 

o ( 

I 2 

1 -1 


1 -1 

i 

O 

0 


0 0 


1 

o 

o 


1 

o 

o 


In general, as long as G 2 Rf = R 2 Gf = 0 holds in the formulation of EAKF, then the 
posterior covariance relation (2.7) holds with formulation (5.6). One can see that from 


SS T = QiAiGi(I + D^CnA.Ql 


QiAfil + A 1 Q^H t HQ 1 A 1 ]~ 1 A 1 Q^ 


which satisfies our need because of the following algebraic identity with A = Q\A\ 


[I + (K - 1 )~ 1 A t H t HA}~ 1 = I- (K - 1 )~ l A T H T (I + HAA t H t )~ 1 HA. 


On the practical side, in mathematical computing packages, directly applying the 
classical EAKF formulation (5.7) will produce the same result as the formula we used (5.6). 
To see this, we note that 


a t q t h t hqa 


A, 0 


Qf 

h t h 

Qi Q 2 

Ai 0 


~A 1 QjH T HQ 1 Ai 0 " 

0 0 


Ql 



0 0 


0 0 


In the default setting of most mathematical programs, the eigenvalue decomposition will 
arrange the eigenvalues in decreasing order, and clue to the block structure, producing the 
orthonormal matrix G and eigenvalue matrix D to be 


G = 

G\ 0 

, D = 

D\ 

0 

O 

1 

1 


0 

0 


Then S is given by the following, 


QAG t (I + D)~ 1 > 2 A ] Q T S = [Qi Ai 


0] 


0 


O 

1 —1 



(I + Dfi - 1 / 2 
0 


1 

O 


A 7 1 0 


O 

_1 

I 


1 

0 

1_ 


Ql 


QiAiRi, 


which is the same as QiAiGj(I + Dfi) l RRi. 


Appendix D. Perturbation theory for matrices 

In [33], the perturbation theory for matrices are carefully studied. Here we collect some 
useful results for our study of the controllability of ESRF, where all the section numbers and 
page numbers mentioned below are referring to [33] if not otherwise specified. One thing 
the reader must be careful is that most results in [33] are for univariate perturbation, while 
our interest lies in multivariate perturbation. So we generally follow the strategy of Section 
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II.5.7, that is trying to express matrices through contour integral of the resolvent, and restrict 
to cases with stable spectrum structure when eigenprojections are involved (hence why we 
pick a specific F* in Lemma 5.9). 

The most fundamental tool used in [33] is the resolvent of a symmetric positive semi- 
definite matrix C, given in Section 1.5.2 by 

R(c,c) = (c-cr 1 . 

Clearly R((, C ) is well defined at any ( not inside the spectrum of C. Moreover if C depends 
C k upon on a multivariate F, using the derivative formula for matrix inversion, R(C, C) 
depends C k upon F as well. The resolvent can be used in the Dunford-Taylor integral 
to give simple expression for functions of matrix C | see section 1.5.6. For example in the 
computation of ETKF, we need to compute (/ + C)~ 1//2 , which can be expressed as 

(/ + C)- 1 ' 2 = / (1 + 0~ 1/2 R((, CMC 

Here T is a closed loop in the positive half of the complex plain that encloses all eigenvalues 
of C. Then by the differentiability of R((C), it is very easy to see the following lemma 

Lemma Appendix D.l. Let C(F) be a symmetric semi positive definite matrix that 
depends C k upon F e M", then (/ + C(F))^ 1 / 2 exists and depends C k upon F. 

When both C and its perturbation are symmetric, the projection to the eigenspace of 
a particular eigenvalue A, which is called the eigenprojection (Section 1.5.3), can be defined 
through a contour integral of the resolvent using Cauchy’s integral formula: 

p ^-h.l R ^ C)d ^ 

where is a path in the complex half-plane {z G C : Re(z) > —1} that encloses A 
but not any other eigenvalues. As a reminder, when the eigenvalue A splits into different 
branches after perturbation, the perturbed eigenprojection P\(F) will be the sum of the 
eigenprojections of all branches, and therefore it is also known as the total projection of 
the A-group (Section II.2.1). This is a slightly nasty case and we try to avoid it by picking 
a point where the spectrum is stable. In particular, if A is a simple eigenvalue, P\ is the 
orthogonal projection to the eigenvector, and the splitting singularity does not exist for A. 
Because R((, C) is smooth or C k w.r.t. F and the eigenvalues are continuous with respect to 
perturbations, the eigenprojection P\ is smooth or C k w.r.t. F. In particular, the directional 
derivative in a direction / can be computed as 

V,P X = P x (V,C)iY ^(A - f))- 1 / 5 ,,] + - ri)-'P x ](V,C)P x . (D.l) 

v¥=* ri^X 

Here the summation of r) is over all spectrum of C that is not A. (See page 88 equation 
(2.14) because all the eigenvalues are semi simple when C is symmetric, there S is given by 
(5.28) and (5.32) on page 42). 
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The perturbation to the eigenvectors is studied through transformation functions 
(Section ff.4.2). Consider a perturbation C(x) parametrized by a scalar x such that 
C — (7(0), then the transformation matrix is defined through an ODE: 

U(x) = P x (x)P x (x)U(x), U (0) = I d . (D.2) 

A 

One can similarly define U^ 1 (x) and consequently show that U(x)P\(x)U~ 1 (x) = P\(0) 
for all eigenvalues A of C. Moreover, when C and M are symmetric, U(x) is actually an 
orthogonal matrix (Section II.6.2), so if no eigenvalues splitting occur after the perturbation, 
then U(x) actually provides a transformation of the eigenvectors of C to the ones of C(x). 
Therefore, for symmetric perturbation C(F) small enough, we can define a transformation 
matrix U(F) as U(l) with Cp(x) = C + x(C(F) — C). If C(F) depends C k on F, then 
because the coefficients of the ODE (D.2) depends C k on F, then so is the solution 

U(F) = exp ^ P x p(x)P x p(x)dx^j , with P x p(x) = J R((,Cp(x))d( 


and P x p(x) given by (D.l). Notice that Cp(ex) = C e p{x), and D e pC = e 1 DpC so 
P Xe p = e~ l P Xe p using a change of variable formula we get 


U (eF) = exp 


= exp 


V, P\eF( X )P\eF( X )d : 

A 

P\,f( x )P\ ,p( x )d" 


X . 


So the directional derivative of U(F) at F = 0 is 


VfU(F) 


F =0 


d 

de 


exp( ,f{ x )P\,f( x )dx ) - V P Kt p,, 

\ J 0 x / e=0 X 


E ^(®/ c )iE< A -+ E< A - »r 1 u,](®/C)p, 

A L Tj^\ 77^ A 


P x . (D.3) 


Formula (D.3) can also be presented in a more concrete matrix form through its image over 
eigenvectors. Suppose the eigenvalue decomposition of a symmetric matrix C be given as 
C = T t ET, we consider the perturbation of \H(F) = Tf/(F) 7 near F*. Let be one of 
the left eigenvectors associated with the i-th eigenvalue A;, then P\. is give by matrix TfT,;, 
where T, is T with rows not associated with A i removed. Then using ipfi. I/J Tj = we 

have 

VmA = AiPfC) - A,)- 1 ^, (D.4) 


To conclude, we have shown that 
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Lemma Appendix D.2. Let C(F) € M. dxd be a symmetric semi positive definite matrix that 
depends C 1 upon F e M™, then around any point F*, there is an orthogonal transformation 
map U(F) that depends C 1 upon F with directional derivative given by (D.3) or matrix 
representation (D.4). It characterizes the perturbation of eigen-projection by 

U (F) t P\(F)U(F) = P\(F*). 

In particular, when X is a simple eigenvalue of C(F*), U(F) transforms the eigenvector of 
C(F*) associated with X to an eigenvector of C(F) associated with the perturbed X. 
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